Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collect json paths in indexing #2231

Merged
merged 2 commits into from
Nov 1, 2023
Merged

collect json paths in indexing #2231

merged 2 commits into from
Nov 1, 2023

Conversation

PSeitz
Copy link
Contributor

@PSeitz PSeitz commented Oct 27, 2023

Partially implementing #2015 and preparation for #2215

Next step would be to remove the JsonTermWriter and use the same technique in fast field json paths

@PSeitz PSeitz force-pushed the path_collector branch 5 times, most recently from 0813b35 to f0f121e Compare October 27, 2023 10:47
///
/// # Safety
/// Any call to get or mutate_or_create after this call is undefined behavior.
pub unsafe fn iter_mut_keys<F: Fn(&mut [u8])>(&mut self, cb: F) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to do the remap here?

COuld that be done in a safe not mut way upon serialization? (in serialize postings)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| Why do we need to do the remap here?

We need the keys in lexicographical order, for the serialization.

I think I can change it to consume the hashmap and remove the unsafe

@fulmicoton
Copy link
Collaborator

fulmicoton commented Oct 31, 2023

This PR surely saves memory, but i have no clue what impact on indexing throughput is.
Did you measure that?

(Of course, the next step is to push the ids at a higher level to avoid building the json path construction, but you were right to not ship that in that PR.)

@PSeitz
Copy link
Contributor Author

PSeitz commented Nov 1, 2023

This PR surely saves memory, but i have no clue what impact on indexing throughput is. Did you measure that?

(Of course, the next step is to push the ids at a higher level to avoid building the json path construction, but you were right to not ship that in that PR.)

I didn't measure a lot, but it seemed relatively unchanged. The extra look-ups seem to be set of by the less copying and faster hash calculation

@PSeitz PSeitz merged commit 28dd6b6 into main Nov 1, 2023
@PSeitz PSeitz deleted the path_collector branch November 1, 2023 10:25
PSeitz added a commit that referenced this pull request Apr 10, 2024
* collect json paths in indexing

* remove unsafe iter_mut_keys
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants